Abstract 

While probabilistic forecast verification for categorical forecasts is well established, some of 
the existing concepts and methods have not found their equivalent for the case of continuous 
variables. New tools dedicated to the assessment of forecast discrimination ability and forecast 
value are introduced here, based on quantile forecasts being the base product for the continuous 
case (hence in a nonparametric framework). The relative user characteristic (RUC) curve and 
the quantile value plot allow analysing the performance of a forecast for a specific user in a 
decision-making framework. The RUC curve is designed as a user-based discrimination tool 
and the quantile value plot translates forecast discrimination ability in terms of economic value. 
The relationship between the overall value of a quantile forecast and the respective quantile 
skill score is also discussed. The application of these new verification approaches and tools is 
illustrated based on synthetic datasets, as well as for the case of global radiation forecasts from 
the high resolution ensemble COSMO-DE-EPS of the German Weather Service. 


1 Introduction 

Verification of probabilistic weather forecasts is an area of intensive research and growing 
interest as ensemble forecasting is becoming a standard approach in numerical weather prediction. 
Ensemble prediction systems (EPS) issue a sample of possible future states of the atmosphere 
(Lewis, 2005; Leutbecher and Palmer, 2008). The forecasts can be interpreted in the form of a 
predictive distribution and probabilistic products can be derived in order to support and optimize 
forecast-based decision-making (Krzysztofowicz, 1983). Appropriate tools for the assessment of 
probabilistic products from this perspective are therefore essential. 

Such tools already exist for probabilistic products expressed in the form of a probability forecast. 
The relative operating characteristic (ROC) curve is a common verification tool for the assessment 
of probability forecasts (Mason, 1982). The ROC curve is related to decision-making analysis 
and the corresponding fundamental property of the forecast is called discrimination. Forecast 
discrimination assesses whether the forecast can be used to successfully discriminate between the 
observations (Murphy, 1991) or, said differently, whether appropriate decisions can be taken based 
on a forecast. Discrimination is translated in terms of economic value using a simple cost-loss 
model that allows the specificity of a user to be taken into account through the definition of a 
cost-loss ratio. The derived quantitative measure is called value score or relative value and is usually 
represented in the form of a probability value plot showing the forecast value as a function of the 
user’s cost-loss ratio (Richardson, 2000; Wilks, 2001; Zhu et al, 2002). The value of a forecast 
is defined as the benefit to a user as a result of making decisions based on a forecast and has 
to be distinguished from forecast quality, the overall agreement between forecast and observation 
(Murphy, 1993). In a verification process, value reflects the point of view of the forecast user 
and quality the point of view of the forecast provider. 
The distinction between the two types of goodness, value and quality, is crucial since a non-linear 
relationship between them can lead to situations where a large improvement in the forecast quality 
does not imply an increase in the forecast value, or conversely, a small improvement in forecast 
quality can bring a notable benefit in terms of forecast value (Chen et al, 1987; Buizza, 2001; 
Pinson, 2013). 

Probabilistic products can be expressed in terms of a probability when the focus is on a particular 
event of interest, but also in terms of a quantile when the focus is on a particular probability level of 
interest. While a probability forecast first requires the definition of an event, i.e. the categorization 
of the original information, a quantile forecast is a ’single-valued’ forecast expressed in the unit of 
the variable being forecast. Considering here probabilistic products derived from EPS simulations 
for continuous variables, such as temperature, wind speed or global radiation, quantile forecasts 
allow one to keep working with a continuous forecast, like the original one, by fixing a nominal probability 
level. The choice of a probability level is directly related to the user’s loss function; a quantile 
forecast at a given probability level is the optimal forecast for users with a specific asymmetry in 
their loss function (Koenker and Machado, 1999; Friederichs and Hense, 2007; Gneiting, 2011a). 
Based on the relationship between user’s loss function and quantile forecast level, the quantile score 
(QS) is the natural scoring rule for assessing the quality of quantile forecasts (Koenker and Machado, 
1999; Friederichs and Hense, 2007; Gneiting, 2011a). More recently, the verification of quantile 
forecasts has benefited from the tradition and concepts stemming from the probability forecast 
verification framework. It has been shown that QS is a proper scoring rule and a decomposition 
of the score has been proposed (Bentzien and Friederichs, 2014). The QS decomposition provides 
information about reliability and resolution, two other fundamental attributes of a probabilistic 
forecast (Toth et al, 2003). 

The aim of the paper at hand is to extend the range of verification methods dedicated to the 
assessment of quantile forecasts. In particular, the assessment of quantile forecasts from the 
user’s perspective, in a decision-making framework, is explored here. Based on a simple cost-loss 
model, the concepts of forecast discrimination and forecast value are revisited focusing on a 
specific user rather than on a specific event. First, a new tool is proposed for the analysis of 
user-based discrimination. The so-called relative user characteristic (RUC) curve and the associated 
summary measure are shown to be adequate for the assessment of quantile forecast discrimination 
ability. Secondly, quantile forecast value is discussed as an application of the value score to quantile 
forecasts. The quantile value plot, showing the economic value of a forecast as a function of a range 
of events of interest, is proposed as a new tool for the visualization of quantile forecast performance. 
Finally, the relationship between quantile forecast value and quantile skill score is discussed in the 
same vein as the relationship between probability forecast value and Brier skill score (Murphy, 
1969). The concepts developed are first illustrated with the help of synthetic datasets and in a 
second step applied to probabilistic forecasts derived from an EPS. 

The manuscript is organized as follows: Section 2 describes the datasets that are used to illustrate 
the discussion. Section 3 introduces definitions and notations and describes the relationship 
between quantile forecast and forecast user within a cost-loss model framework. Section 4 discusses 
the concept of discrimination and Section 5 the application of the economic value score to quantile 
forecasts. Section 6 presents the conclusions. 

2 Data 


2.1 Synthetic datasets 

In order to illustrate the concepts discussed hereafter, we make use of synthetic and real datasets. 

The synthetic data are derived from a toy-model based on normal distributions often used to 
illustrate verification discussions (e.g. Hamill, 2001; Weigel, 2011). The toy-model is kept simple 
in order to facilitate the interpretation of the results. 

We consider a signal $s$, normally distributed, written $s \sim \mathcal{N}(0,1)$. We assume that the observations 
are randomly drawn from a distribution $\mathcal{N}(s,1)$ and that the associated predictive distribution is described 
by $\mathcal{N}(s + \beta, \sigma)$, where $\beta$ is the unconditional bias parameter and $\sigma$ the dispersion parameter. We 
define the following test cases: 

$A_0$ : $\beta = 0$, $\sigma = 1$ (a perfect probabilistic forecast), 

$A_1$ : $\beta = -0.75$, $\sigma = 1$ (a biased forecast), 

$A_2$ : $\beta = 0$, $\sigma = 1/3$ (an underdispersive forecast), 

$B$ : $\beta = \epsilon_B$, $\sigma = 1$ (a forecast with white noise), 

where $\epsilon_B$ is drawn from a uniform distribution defined on $]-5,5[$. The first three datasets $A_0$, 
$A_1$ and $A_2$ differ only in terms of biases while the fourth dataset $B$ corresponds to a forecast with 
a dynamically disturbed signal. 
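
As an illustration, the four test cases can be simulated along the following lines. This is a minimal sketch rather than the code used for the paper: it assumes that the second parameter of $\mathcal{N}(\cdot,\cdot)$ is a standard deviation and that $\epsilon_B$ is redrawn independently for every forecast case. The function names, sample size and seed are hypothetical.

```python
import random
from statistics import NormalDist

# Test cases of Section 2.1: (beta, sigma, perturbed signal?)
CASES = {
    "A0": (0.0, 1.0, False),        # perfect probabilistic forecast
    "A1": (-0.75, 1.0, False),      # biased forecast
    "A2": (0.0, 1.0 / 3.0, False),  # underdispersive forecast
    "B": (0.0, 1.0, True),          # beta replaced by the white noise epsilon_B
}

def simulate(case, n=2000, seed=42):
    """Return observations and predictive distributions for one test case."""
    beta, sigma, perturbed = CASES[case]
    rng = random.Random(seed)
    observations, predictive = [], []
    for _ in range(n):
        s = rng.gauss(0.0, 1.0)                     # signal s ~ N(0, 1)
        observations.append(rng.gauss(s, 1.0))      # observation ~ N(s, 1)
        # Assumption: epsilon_B is drawn anew for each case from U(-5, 5).
        shift = rng.uniform(-5.0, 5.0) if perturbed else beta
        predictive.append(NormalDist(mu=s + shift, sigma=sigma))
    return observations, predictive

def quantile_forecasts(predictive, tau):
    """tau-quantile forecast of each predictive distribution."""
    return [dist.inv_cdf(tau) for dist in predictive]

def coverage(observations, forecasts):
    """Relative frequency of observations not exceeding the forecast."""
    hits = sum(o <= q for o, q in zip(observations, forecasts))
    return hits / len(observations)

obs, pred = simulate("A0")
# For the reliable case A0 the coverage should be close to the nominal level.
print(coverage(obs, quantile_forecasts(pred, 0.9)))
```

For the biased case $A_1$, the same check applied to the 50%-quantile forecast gives a coverage well below 0.5, which is the lack of reliability built into that test case.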

2.2 COSMO-DE-EPS 

Real datasets are provided by COSMO-DE-EPS, a regional ensemble prediction system run 
operationally at Deutscher Wetterdienst, Offenbach, Germany. The ensemble system is based on a 
2.8 km grid resolution version of the COSMO model (Steppeler et al, 2003; Baldauf et al, 2011) 
with a model domain that covers Germany and parts of the neighbouring countries. The ensemble 
comprises 20 members including variations in initial conditions, physics parameterisations and 
boundary conditions (Gebhardt et al, 2011; Peralta et al, 2012). 

COSMO-DE-EPS was first developed with a focus on high-impact weather events (Ben Bouallegue et al., 
2013; Ben Bouallegue and Theis, 2014) and is planned to be used for energy applications. The 
focus in this paper is on global radiation, which is the main weather variable affecting solar energy 
forecasts. Verification is applied to the 0300 UTC run with a forecast horizon ranging between 5 and 15 
hours. Two periods of 3 months are compared: winter (December, January, February) 2012/2013 
and summer (June, July, August) 2013. The observation dataset consists of quality-controlled pyranometer 
measurements from 32 stations distributed over Germany (Becker and Behrens, 2012). 

Global radiation forecasts and observations are transformed into clearness index before verification. 
The clearness index is defined as the ratio between global radiation at ground and global radiation 
at the top of the atmosphere (Badescu, 2008). This pre-processing of the data removes climatological 
effects that could otherwise lead to a misinterpretation of the verification results (Hamill and Juras, 2006). 

3 Definitions and framework 

3.1 Quantile forecast, quantile score, and quantile skill score 

We first consider the quantity to be forecast (or observation) $\Omega \in \mathbb{R}$ that we assume to be a 
continuous random variable driven by a stochastic process. An observed event $E$ is defined by a 
threshold $\omega$ as $E : \Omega > \omega$. The base rate $\pi$ of an event $E$ (or climatological frequency) corresponds 
to: 

$$\pi = \Pr(\Omega > \omega). \tag{1}$$

Consider now a predictive cumulative distribution $F(x)$. The probability forecast $p_\omega$ of event $E$ is 
defined as: 

$$p_\omega = 1 - F(\omega). \tag{2}$$

The quantile forecast $q_\tau$ at probability level $\tau$ ($0 < \tau < 1$) is defined as: 

$$q_\tau := F^{-1}(\tau) = \inf\{y : F(y) \geq \tau\} \tag{3}$$

such that the relationship between a probability forecast and a quantile forecast is expressed as: 

$$p_{q_\tau} = 1 - \tau. \tag{4}$$

Figure 1 shows an example of a cumulative distribution function $F(x)$. A threshold $\omega$ and the 
associated probability forecast $1 - F(\omega)$ as well as a probability level $\tau$ and the associated quantile 
forecast $q_\tau$ are shown on the plot. 
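
The two products of Eqs (2) and (3) can be illustrated with a short sketch. The Gaussian predictive distribution and its parameters below are arbitrary assumptions chosen for the example; any continuous, strictly increasing $F$ would do.

```python
from statistics import NormalDist

# Hypothetical predictive distribution F (parameters chosen arbitrarily).
F = NormalDist(mu=1.0, sigma=2.0)

def probability_forecast(dist, omega):
    """Eq. (2): probability of exceeding the threshold omega."""
    return 1.0 - dist.cdf(omega)

def quantile_forecast(dist, tau):
    """Eq. (3): quantile at probability level tau (inverse of F)."""
    return dist.inv_cdf(tau)

tau = 0.9
q_tau = quantile_forecast(F, tau)
# Eq. (4): the probability of exceeding the tau-quantile is 1 - tau.
print(q_tau, probability_forecast(F, q_tau))
```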

The quantile score (QS) is the scoring rule applied in order to assess the quality of a quantile 
forecast. QS is based on an asymmetric piecewise linear function called the check function. The 
check function was first defined in the context of quantile regression (Koenker and Bassett, 1978): 


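$$\rho_\tau(u) = u\,[\tau - I(u < 0)] = \begin{cases} \tau u & \text{if } u \geq 0 \\ (\tau - 1)\,u & \text{if } u < 0 \end{cases} \tag{5}$$

where $I(\cdot)$ is an indicator function having value 1 if the condition in parentheses is true and zero 
otherwise. QS results from the mean of the check function applied to the pairs $i = 1, \dots, N$ of 
observation $\Omega_i$ and quantile forecast $q_{\tau,i}$ following 

$$\mathrm{QS} = \frac{1}{N} \sum_{i=1}^{N} \rho_\tau(\Omega_i - q_{\tau,i}), \tag{6}$$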


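where $N$ is the size of the verification sample. Developing Eq. (6) we can write 

$$\mathrm{QS} = \frac{\tau}{N} \sum_{i:\,\Omega_i \geq q_{\tau,i}} (\Omega_i - q_{\tau,i}) + \frac{1-\tau}{N} \sum_{i:\,\Omega_i < q_{\tau,i}} (q_{\tau,i} - \Omega_i). \tag{7}$$

The scoring rule thus consists of penalties per unit $\tau$ and $1 - \tau$ associated with under-forecasting 
(forecast below the observation) and over-forecasting (forecast above the observation), respectively. 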
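
The check function and the quantile score translate directly into code. The sketch below follows Eqs (5) and (6); the function names are hypothetical and plain Python lists are assumed as inputs.

```python
def check_function(u, tau):
    """Eq. (5): asymmetric piecewise linear loss."""
    indicator = 1.0 if u < 0 else 0.0
    return u * (tau - indicator)

def quantile_score(observations, forecasts, tau):
    """Eq. (6): mean check loss over observation/forecast pairs."""
    losses = [check_function(o - q, tau) for o, q in zip(observations, forecasts)]
    return sum(losses) / len(losses)

# With tau = 0.9, a forecast 2 units too low costs far more than one 2 units too high.
print(check_function(2.0, 0.9), check_function(-2.0, 0.9))
```

The asymmetry is visible in the last line: for a high probability level, forecasts that fall below the observation are penalized more heavily than forecasts that exceed it.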

Skill scores are computed in order to measure the relative benefit of using a forecast compared to 
a reference forecast (Wilks, 2006). The quantile skill score (QSS) measures the skill of a quantile 
forecast compared to a reference quantile forecast. Considering the climatology as reference, QSS 
corresponds to: 


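$$\mathrm{QSS} = \frac{\mathrm{QS}_{\text{forecast}} - \mathrm{QS}_{\text{climate}}}{\mathrm{QS}_{\text{perfect}} - \mathrm{QS}_{\text{climate}}} = 1 - \frac{\mathrm{QS}_{\text{forecast}}}{\mathrm{QS}_{\text{climate}}} \tag{8}$$

where $\mathrm{QS}_{\text{forecast}}$, $\mathrm{QS}_{\text{perfect}}$ and $\mathrm{QS}_{\text{climate}}$ represent the quantile scores of the forecast under 
assessment, of a perfect deterministic forecast and of a climatological $\tau$-quantile forecast, respectively. 
$\mathrm{QS}_{\text{perfect}}$, by definition, equals 0 and a climatological $\tau$-quantile forecast, noted $\Omega_\tau$, is here defined 
as the $\tau$-quantile of the observation distribution over the verification sample. 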
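
A possible implementation of the skill score is sketched below. The sample quantile used for the climatological reference is an assumption: it takes the smallest sample value whose empirical cumulative frequency reaches $\tau$, mirroring Eq. (3); other sample quantile definitions would give slightly different reference scores. Function names are hypothetical.

```python
import math

def check_function(u, tau):
    """Eq. (5): asymmetric piecewise linear loss."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def quantile_score(observations, forecasts, tau):
    """Eq. (6): mean check loss."""
    pairs = zip(observations, forecasts)
    return sum(check_function(o - q, tau) for o, q in pairs) / len(observations)

def climatological_quantile(observations, tau):
    """Assumed sample quantile: smallest value with empirical CDF >= tau."""
    ordered = sorted(observations)
    index = max(math.ceil(tau * len(ordered)) - 1, 0)
    return ordered[index]

def quantile_skill_score(observations, forecasts, tau):
    """Eq. (8) with QS_perfect = 0 and the sample climatology as reference."""
    reference = climatological_quantile(observations, tau)
    qs_forecast = quantile_score(observations, forecasts, tau)
    qs_climate = quantile_score(observations, [reference] * len(observations), tau)
    return 1.0 - qs_forecast / qs_climate

obs = [1.0, 2.0, 3.0, 4.0]
print(quantile_skill_score(obs, obs, 0.5))          # perfect forecast
print(quantile_skill_score(obs, [2.0] * 4, 0.5))    # climatological forecast
```

By construction the score equals 1 for a perfect forecast and 0 for the climatological reference itself.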

Figure 1: Example of a predictive cumulative distribution function $F(x)$. Probabilistic products 
are derived either fixing a threshold $\omega$ and deriving the associated probability forecast $p_\omega$ or fixing 
a probability level $\tau$ and deriving the associated quantile forecast $q_\tau$. 

3.2 Cost-loss model and optimal decision-making 

The framework used to discuss the concept of user and decision-making is based on a static cost-loss 
model (Thompson, 1962; Katz and Murphy, 1997). The cost-loss model describes situations 
of dichotomous decisions: a user has to decide whether or not to take protective action against 
potential occurrence of an event $E$. The decision is made based on a decision variable (or forecast) 
$\Lambda$. A decision criterion $\lambda$ applied to the decision variable defines an action $A : \Lambda > \lambda$. Taking 
action implies a cost $C$. In the case of occurrence of the event $E$ without preventive action, a loss 
$L$ is encountered. The cost-loss ratio is denoted $\alpha$: 

$$\alpha = \frac{C}{L}. \tag{9}$$

A user with cost-loss ratio $\alpha$ is called hereafter an $\alpha$-user. Based on this simple model the optimal 
decision strategy of an $\alpha$-user can be discussed (e.g. Richardson, 2011). The problem consists of 
finding, for a decision variable $\Lambda$, the critical decision criterion $\lambda_\alpha$ that minimizes the $\alpha$-user mean 
expense if actions are taken when $\Lambda > \lambda_\alpha$. 

Consider first the case of a probability forecast $p_\omega$ as a decision variable. Based on $p_\omega$, does the 
user have to take action or not? In order to answer this question, the average expenses in the cases 
of positive and negative answers are compared. If the answer is yes, the user encounters a cost $C$ 
on every occasion, so the average expense $E_{\text{yes}}$ is simply 

$$E_{\text{yes}} = C. \tag{10}$$

If the answer is no, the user has no cost but a loss $L$ on each occasion where the event occurs, so 
on average the user’s expense $E_{\text{no}}$ is 

$$E_{\text{no}} = L \Pr(\Omega > \omega \mid p_\omega), \tag{11}$$

where $\Pr(\Omega > \omega \mid p_\omega)$ is the probability that the event occurs when the probability forecast $p_\omega$ is 
issued. Taking action is the cheaper option when $E_{\text{yes}} < E_{\text{no}}$, so users with a cost-loss ratio 
$\alpha < \Pr(\Omega > \omega \mid p_\omega)$ should take preventive action, while users with a greater cost-loss ratio 
should not. The critical decision criterion $p^*$ associated with the decision variable $p_\omega$ is thus defined as 

$$p^* = \{p_\omega \mid \Pr(\Omega > \omega \mid p_\omega) = \alpha\}. \tag{12}$$

Thus, the action based on the probability forecast, $A : p_\omega > p^*$, optimizes the user’s mean expense 
in the long term. 

If the forecast is reliable, we have by definition $\Pr(\Omega > \omega \mid p_\omega) = p_\omega$: the event actually happens 
with an observed relative frequency consistent with the forecast probability (Bröcker, 2009). The 
optimal decision is then to take action if 

$$p_\omega > \alpha. \tag{13}$$

When the probability forecast is compared to the cost-loss ratio in order to decide whether or 
not to take action (without additional information about forecast reliability), we say that the 
probability forecast is taken at face value. For example, consider users who have to decide whether 
or not to take preventive action against precipitation occurrence. If the forecast probability of 
precipitation is 10%, users with cost-loss ratio lower than 10% take action. If the forecast is not 
reliable, the critical decision criterion is no longer $\alpha$ but has to be adjusted following Eq. (12). 
Statistical adjustment of the forecast based on past data is usually referred to as forecast calibration 
(e.g. Gneiting et al, 2007). 
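
The face-value rule can be written in a few lines. The sketch below assumes a reliable probability forecast, so that the issued probability can stand in for $\Pr(\Omega > \omega \mid p_\omega)$ in Eq. (11); the function names and the cost and loss values are hypothetical.

```python
def expected_expenses(p_forecast, cost, loss):
    """Return (E_yes, E_no) of Eqs (10) and (11) for a reliable forecast."""
    return cost, loss * p_forecast

def take_action(p_forecast, cost, loss):
    """Face-value rule of Eq. (13): act when p_omega exceeds alpha = C / L."""
    return p_forecast > cost / loss

cost, loss = 1.0, 10.0            # hypothetical user with alpha = 0.1
for p in (0.05, 0.30):
    e_yes, e_no = expected_expenses(p, cost, loss)
    print(p, take_action(p, cost, loss), e_yes, e_no)
```

In both cases the rule selects the option with the smaller expected expense, which is the sense in which the criterion is optimal for a reliable forecast.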

Consider now a quantile forecast $q_\tau$ as a decision variable. We apply the same reasoning as for a 
probability forecast. The critical decision criterion $q^*$ associated with $q_\tau$ is defined as 

$$q^* = \{q_\tau \mid \Pr(\Omega > \omega \mid q_\tau) = \alpha\} \tag{14}$$

such that taking action when $q_\tau > q^*$ minimizes the user mean expense. By definition, a quantile 
forecast is reliable if it satisfies 

$$\Pr(\Omega > \omega \mid q_\tau = \omega) = 1 - \tau, \tag{15}$$

i.e. the observed relative frequency of the event defined by the quantile forecast is consistent with 
the quantile forecast probability level. Eq. (14) has a straightforward solution 

$$q^* = \omega \tag{16}$$

when the decision variable is the quantile forecast at probability level $\tau$ defined as 

$$\tau = 1 - \alpha. \tag{17}$$

Taking action when $q_\tau > \omega$ with $\tau = 1 - \alpha$ is equivalent to taking action when $p_\omega > \alpha$ since the 
cumulative probability distribution function $F(x)$ is by definition monotonically increasing (see e.g. 
Figure 1). Hence, a quantile forecast is taken at face value when the user’s decision is made based 
on the comparison of the forecast with the event threshold $\omega$. In our example, if the 90%-quantile 
forecast of precipitation is greater than zero, a user with cost-loss ratio $\alpha = 1 - 0.9 = 0.1$ takes 
preventive action. 
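
The equivalence of the two face-value rules can be checked numerically. The sketch below assumes Gaussian predictive distributions with arbitrary parameters, for which $F$ is strictly increasing; function names are hypothetical.

```python
from statistics import NormalDist

def acts_on_probability(dist, omega, alpha):
    """Face-value rule on the probability forecast: p_omega > alpha."""
    return 1.0 - dist.cdf(omega) > alpha

def acts_on_quantile(dist, omega, alpha):
    """Face-value rule on the quantile forecast: q_tau > omega, tau = 1 - alpha."""
    return dist.inv_cdf(1.0 - alpha) > omega

# Both rules should lead to the same decision for any forecast, event and user.
for mu in (-1.3, 0.2, 1.7):
    dist = NormalDist(mu=mu, sigma=1.0)
    for omega in (-0.5, 0.0, 0.9):
        for alpha in (0.1, 0.35, 0.8):
            same = acts_on_probability(dist, omega, alpha) == acts_on_quantile(dist, omega, alpha)
            print(mu, omega, alpha, same)
```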

In a general form, the critical decision criterion for an $\alpha$-user is defined by 

$$\lambda_\alpha = \{\Lambda \mid \Pr(\Omega > \omega \mid \Lambda) = \alpha\} \tag{18}$$

where the decision variable could equally be the probability forecast $p_\omega$ or the quantile forecast $q_\tau$ 
with $\tau = 1 - \alpha$. Provided that the forecasts are reliable, the critical decision criteria are known and 
have a simple expression (Eqs (13) and (16)). In the following, we say that the decision variable is taken 
at face value when the user applies the decision criterion valid for a reliable forecast, irrespective 
of whether the forecast is actually reliable or not. 

Figure 2: (a) Cost (dashed line) as a function of the level of protection $x$ and loss (full line) as a 
function of the observation $\Omega$. An observation $\Omega_j$ is represented by a vertical line. (b) Expense as a 
function of the difference between the observation $\Omega_j$ and the level of protection $x$. The horizontal 
line indicates the expense for a perfect level of protection. 


3.3 Quantile forecast user 


The dichotomous decision problem is extended to a continuous decision problem considering the 
cost $C$ and the loss $L$ as unitary cost and unitary loss, respectively (Epstein, 1969; Roulston et al, 
2003). The cost of taking protection is a linear function of the level of protection $x$ and the 
loss without protection is a linear function of the observation $\Omega$, as illustrated in Figure 2. The 
optimization problem consists of finding the level of protection that minimizes the expected user 
expense. 

Considering a variable defined on $\mathbb{R}^+$ (the generalization to variables defined on $\mathbb{R}$ is 
straightforward), the expense associated with a level of protection $x$ corresponds to $Cx$. If the observation is 
$\Omega$, then protection is perfect if $x = \Omega$. But if $x > \Omega$, then there is an unnecessary expense due to a 
larger level of protection than is actually needed. If the observation $\Omega$ is greater than the level of 
protection, then additionally a loss $L(\Omega - x)$ is encountered. Measured relative to the expense $C\Omega$ of a 
perfect level of protection, we can formally write the expense function $E$ as 

$$E = \begin{cases} C(x - \Omega) & \text{if } \Omega < x \\ (L - C)(\Omega - x) & \text{if } \Omega \geq x. \end{cases} \tag{19}$$

The expense function is represented in Figure 2. If divided by $L$, the expense function is an 
asymmetric loss function equivalent to the check function defined in Eq. (5), where the asymmetry 
is given by $\tau = 1 - C/L = 1 - \alpha$. Thus the optimal level of protection $x^*$ which minimizes the user’s mean 
expense corresponds to the $1 - \alpha$ quantile of the true predictive distribution of $\Omega$. 
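
Both statements of the previous paragraph can be checked on a sample: the expense of Eq. (19) divided by $L$ coincides with the check function at level $\tau = 1 - \alpha$, and the level of protection with the smallest mean expense is a sample quantile at that level. The sketch below is a brute-force illustration under assumed values ($C = 1$, $L = 5$, a Gaussian sample of 501 values standing in for the distribution of $\Omega$), not an efficient optimizer.

```python
import math
import random

def expense(x, omega, cost, loss):
    """Eq. (19): expense in excess of the expense of perfect protection."""
    if omega < x:
        return cost * (x - omega)
    return (loss - cost) * (omega - x)

def check_function(u, tau):
    """Eq. (5): asymmetric piecewise linear loss."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def best_protection(sample, cost, loss):
    """Sample value minimizing the mean expense (exhaustive search)."""
    return min(sample, key=lambda x: sum(expense(x, o, cost, loss) for o in sample))

cost, loss = 1.0, 5.0                 # hypothetical user, alpha = 0.2
tau = 1.0 - cost / loss               # asymmetry of the equivalent check function
rng = random.Random(7)
sample = [rng.gauss(0.0, 1.0) for _ in range(501)]

x_star = best_protection(sample, cost, loss)
sample_quantile = sorted(sample)[math.ceil(tau * len(sample)) - 1]
print(x_star, sample_quantile)        # the two values coincide
```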

This result is not new: quantile forecasts arise as an optimal solution for users with an 
asymmetric linear loss function (Koenker and Bassett, 1978; Christoffersen and Diebold, 1997). More 
recently, it has been shown that quantile forecasts are optimal forecasts in a stochastic optimization 
framework for a more general class of loss functions (Gneiting, 2011b). 

Asymmetric loss functions find a number of applications, in particular for operational 
decision-making problems related to the integration of renewable energies into the electricity grid. For 
example, asymmetric loss functions can be associated with market participants who want to 
optimize their bids or system operators who have to optimize their reserves. The user’s optimal 
forecast corresponds then to a specific quantile of the predictive distribution where the probability 
is defined by the user’s cost-loss ratio (Pinson et al, 2007; Pinson, 2013). 

4 Discrimination 


Based on the discussion developed in the previous section, continuous decision-making is seen in 
the following as a continuum of dichotomous decisions. For each threshold $\omega$ of the event spectrum, 
the question is whether to take action for the next unit of the variable. The adequate decision 
for a user in order to minimize the expected expense is a function of the user’s cost-loss ratio as 
defined in Eq. (18). Moreover, the relationship between cost-loss ratio and quantile probability 
level, $\tau = 1 - \alpha$, makes the cost-loss ratio $\alpha$ of a user implicit as soon as the level $\tau$ of the quantile 
forecast used as decision variable is selected. 

4.1 General verification framework 

A general framework for forecast verification is based on the joint distribution of forecasts and 
observations (Murphy and Winkler, 1987). The overall agreement between forecasts and 
observations is called quality and is measured by scoring rules, like QS for quantile forecasts. In order to 
access more information about the forecast performance, two factorizations of the joint 
distribution, into conditional and marginal distributions, can be applied: the calibration-refinement (CR) 
factorization when conditioning on the forecasts and the likelihood-base rate (LBR) factorization 
when conditioning on the observations. Summary measures based on these two factorizations are 
associated with attributes, fundamental characteristics of the forecast. Reliability and resolution 
are derived from the CR factorization while discrimination is derived from the LBR factorization 
(Murphy and Winkler, 1992). 

Here the focus is on discrimination, the key forecast attribute for decision-making processes. A 
general definition of discrimination is “the ability of a forecasting system to produce different 
forecasts for those occasions having different realized outcomes” (Wilks, 2006). Discrimination 
assessment is discussed in terms of event and action within the dichotomous decision framework. 
Regarding the LBR factorization, it is common practice to analyse discrimination in terms of hit 
rate $H$ and false alarm rate $F$ defined as 


$$H = \Pr(\Lambda > \lambda \mid \Omega > \omega) \tag{20}$$

$$F = \Pr(\Lambda > \lambda \mid \Omega \leq \omega), \tag{21}$$

respectively. Actions $A : \Lambda > \lambda$ and events $E : \Omega > \omega$ are dichotomous, each presenting two 
alternatives, so $H$ and $F$ can be easily derived from the construction of a 2 x 2 contingency table. 
No discrimination corresponds to the case where: 

$$H = F \tag{22}$$

for all $\lambda \in \Lambda$ and $\omega \in \Omega$, meaning that actions and event occurrence are independent (Bröcker, 
2014). 
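
The estimation of $H$ and $F$ from a 2 x 2 contingency table can be sketched as follows. The function name and argument names are hypothetical; the rates are returned as NaN when the sample contains no event or no non-event.

```python
def hit_and_false_alarm_rates(decision_values, observations, criterion, threshold):
    """Estimate H and F of Eqs (20) and (21) from a 2 x 2 contingency table."""
    hits = misses = false_alarms = correct_negatives = 0
    for value, obs in zip(decision_values, observations):
        action = value > criterion      # action A: Lambda > lambda
        event = obs > threshold         # event E: Omega > omega
        if action and event:
            hits += 1
        elif event:
            misses += 1
        elif action:
            false_alarms += 1
        else:
            correct_negatives += 1
    n_event = hits + misses
    n_nonevent = false_alarms + correct_negatives
    hit_rate = hits / n_event if n_event else float("nan")
    false_alarm_rate = false_alarms / n_nonevent if n_nonevent else float("nan")
    return hit_rate, false_alarm_rate

# Four cases: one hit, one false alarm, one miss, one correct negative.
print(hit_and_false_alarm_rates([0.9, 0.8, 0.2, 0.1], [1.0, 0.0, 1.0, 0.0], 0.5, 0.5))
```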

4.2 Event-based discrimination 

We first focus on one particular event defined by a threshold $\omega$, with event-specific hit rate $H_\lambda$ and 
false alarm rate $F_\lambda$. A popular way to assess discrimination (Eq. (22)) is to plot the set of points 
$(F_\lambda, H_\lambda)$ for a range of actions with $\lambda \in \Lambda$. The resulting curve is known as the relative operating 
characteristic (ROC) curve. When action and event occurrence are independent, the ROC curve 
is a diagonal line. Concavity of the curve indicates a discrimination ability in the forecast and the 
area under the curve (AUC) becomes a quantitative measure of forecast discrimination (Mason, 
1982). Figure 3 (a) shows an example of a ROC curve for the synthetic dataset $A_0$. The event of 
interest is $E : \Omega > 0$ with a base rate $\pi = \Pr(\Omega > 0)$ of 0.5. The respective forecast probability 
$p_0 = 1 - F(0)$ is used as decision variable. 

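The interpretation of the ROC curve can be related to the dichotomous decision model described 
in Section 3.2 as discussed for example in Richardson (2011). In order to describe this relationship, 
we consider the slope of the ROC curve, defining first the gradient of a line joining two successive 
ROC points $(F_\lambda, H_\lambda)$ and $(F_{\lambda+\Delta\lambda}, H_{\lambda+\Delta\lambda})$: 

$$\frac{H_\lambda - H_{\lambda+\Delta\lambda}}{F_\lambda - F_{\lambda+\Delta\lambda}} = \frac{\Pr(\Lambda > \lambda \mid \Omega > \omega) - \Pr(\Lambda > \lambda + \Delta\lambda \mid \Omega > \omega)}{\Pr(\Lambda > \lambda \mid \Omega \leq \omega) - \Pr(\Lambda > \lambda + \Delta\lambda \mid \Omega \leq \omega)}. \tag{23}$$

The slope of the curve $\gamma$ is obtained when $\Delta\lambda$ tends to 0: 

$$\gamma(\lambda, \omega) = \frac{\Pr(\Lambda = \lambda \mid \Omega > \omega)}{\Pr(\Lambda = \lambda \mid \Omega \leq \omega)} \tag{24}$$

where the ratio is also known as the likelihood ratio (Bröcker, 2011). Using Bayes’ rule and the 
definition of the critical decision criterion of an $\alpha$-user in Eq. (18), we can write 

$$\gamma(\lambda_\alpha, \omega) = \frac{1 - \pi}{\pi}\,\frac{\alpha}{1 - \alpha} \tag{25}$$

where $\pi = \Pr(\Omega > \omega)$ is the base rate of an event $E : \Omega > \omega$ and $\lambda_\alpha$ the corresponding critical 
decision criterion of an $\alpha$-user. 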
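
The step from the likelihood ratio to the expression in terms of base rate and cost-loss ratio can be spelled out as follows. This is a sketch of the intermediate algebra only, with $\Pr(\Lambda = \lambda_\alpha)$ read as a probability density when $\Lambda$ is continuous; it uses Bayes' rule on the numerator and denominator, then the definition $\Pr(\Omega > \omega \mid \Lambda = \lambda_\alpha) = \alpha$ from Eq. (18).

```latex
\begin{align*}
\gamma(\lambda_\alpha, \omega)
  &= \frac{\Pr(\Lambda = \lambda_\alpha \mid \Omega > \omega)}
          {\Pr(\Lambda = \lambda_\alpha \mid \Omega \leq \omega)} \\
  &= \frac{\Pr(\Omega > \omega \mid \Lambda = \lambda_\alpha)\,\Pr(\Lambda = \lambda_\alpha) / \pi}
          {\Pr(\Omega \leq \omega \mid \Lambda = \lambda_\alpha)\,\Pr(\Lambda = \lambda_\alpha) / (1 - \pi)} \\
  &= \frac{1 - \pi}{\pi}\,
     \frac{\Pr(\Omega > \omega \mid \Lambda = \lambda_\alpha)}
          {1 - \Pr(\Omega > \omega \mid \Lambda = \lambda_\alpha)}
   = \frac{1 - \pi}{\pi}\,\frac{\alpha}{1 - \alpha}.
\end{align*}
```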

The range of decision criteria $\lambda$ used to derive the ROC curve $(F_\lambda, H_\lambda)$ corresponds to a range 
of critical decision criteria associated with users with different cost-loss ratios. Each point of the 
ROC curve is associated with a specific $\alpha$-user that is identified by the slope of the curve at that 
point. The slope ranges between 0 and $+\infty$ at the top-right and the bottom-left corners of 
the ROC plot, respectively. Moving along the curve from the top to the bottom corresponds to varying 
the cost-loss ratio $\alpha$ between 0 and 1. 

For example, consider a user with a cost-loss ratio $\alpha = 50\%$. In Figure 3, the point of the ROC 
curve with slope $\gamma = 1$ is highlighted ($\alpha = 0.5$, $\pi = 0.5$ in Eq. (25)). This point indicates the 
performance of the forecast in terms of $H$ and $F$ for this particular user. Conversely, the decision 
criterion applied to obtain this point corresponds to the critical decision criterion for the 50%-user. 
The ROC curve applied to a decision variable, then, corresponds to testing whether actions and 
event occurrence are independent for one event and a range of users with different cost-loss ratios. 
The ROC curve is an event specific but user unspecific discrimination tool and is therefore 
well-adapted to probability forecast discrimination assessment. 
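
An empirical ROC curve for one event can be sketched as follows. The example mimics the setting of Figure 3 (a) with a simulated sample: signal $s \sim \mathcal{N}(0,1)$, observations drawn from $\mathcal{N}(s,1)$, predictive distribution $\mathcal{N}(s,1)$ and event $\Omega > 0$. The sample size, seed, grid of decision criteria and function names are assumptions, and the area is a plain trapezoidal estimate rather than a bi-normal fit.

```python
import random
from statistics import NormalDist

def roc_points(decision_values, events, criteria):
    """(F, H) pairs for one event and a range of decision criteria."""
    n_event = sum(events)
    n_nonevent = len(events) - n_event
    points = {(0.0, 0.0), (1.0, 1.0)}               # trivial end points
    for lam in criteria:
        hits = sum(1 for v, e in zip(decision_values, events) if e and v > lam)
        false_alarms = sum(1 for v, e in zip(decision_values, events) if not e and v > lam)
        points.add((false_alarms / n_nonevent, hits / n_event))
    return sorted(points)

def area_under_curve(points):
    """Trapezoidal area under a curve given as sorted (F, H) pairs."""
    return sum(0.5 * (h0 + h1) * (f1 - f0)
               for (f0, h0), (f1, h1) in zip(points, points[1:]))

rng = random.Random(0)
signal = [rng.gauss(0.0, 1.0) for _ in range(2000)]
observations = [rng.gauss(s, 1.0) for s in signal]
p0 = [1.0 - NormalDist(mu=s, sigma=1.0).cdf(0.0) for s in signal]  # p_0 = 1 - F(0)
events = [o > 0.0 for o in observations]                            # event: Omega > 0

points = roc_points(p0, events, [i / 20 for i in range(21)])
print(area_under_curve(points))   # well above 0.5 for this forecast
```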


4.3 User-based discrimination 

We focus now on a user with cost-loss ratio $\alpha$. The critical decision criterion $\lambda_\alpha$ defines the action 
of this specific user with respect to an event. We define then the user-specific hit rate $H_\alpha$ and 
false alarm rate $F_\alpha$ as in Eqs (20) and (21) for a fixed $\alpha$. In order to test Eq. (22), the set 
of points $(F_\alpha, H_\alpha)$ is plotted for a range of events. We call the resulting curve a relative user 
characteristic (RUC) curve because it is a comparison of two user characteristics ($F_\alpha$ and $H_\alpha$) as 
the event definition varies. As for the ROC curve, the no discrimination line corresponds to the 
diagonal line and concavity of the curve indicates forecast discrimination ability. 

Figure 3 (b) shows an example of a RUC curve valid for a user with cost-loss ratio $\alpha = 50\%$. 
In this example, the decision variable is the 50%-quantile forecast from the synthetic dataset $A_0$. 
Moving along the RUC curve from the bottom left corner to the top right corner involves varying 
the event under focus, the event’s base rate varying from 0 to 1, respectively. The point with slope 
$\gamma = 1$ corresponds to the event $E : \Omega > 0$ with base rate $\pi = 0.5$. This point is the same 
as in Figure 3 (a). 

Figure 3: Discrimination curves for decision variables from the synthetic dataset $A_0$. The diagonal 
lines are the no discrimination lines. The points correspond to the $(F, H)$ pair for the event $\Omega > 0$ 
and the action associated with the 50%-users. (a) ROC curve of the probability forecast $p_0$ for the 
event $E : \Omega > 0$, with base rate $\pi = 0.5$, and equi-cost lines (in grey) of slope $\gamma = 1$. (b) RUC 
curve of the quantile forecast $q_{0.5}$ for the user with cost-loss ratio $\alpha = 0.5$. 

In order to produce a RUC curve, critical decision criteria have to be known for a range of events. 
They can be estimated by solving Eq. (14) numerically. In practice, critical decision criteria can 
also be estimated by means of a reliability diagram. For example, a reliability diagram for 
quantile forecasts plots the conditional observed quantile as a function of quantile forecast categories 
(Bentzien and Friederichs, 2014). With regard to Eq. (15), we can deduce that the mean forecast 
in each forecast category (horizontal axis of the diagram) is an estimation of the critical 
decision criteria associated with the events defined by the corresponding conditional observed quantile 
(vertical axis of the diagram). 

The RUC curve is user specific (and event unspecific) and therefore well-adapted to quantile 
forecast discrimination. A summary measure of quantile discrimination ability is obtained mimicking 
the ROC framework: the area under the RUC curve, noted here AUC, is proposed as a 
quantitative measure of discrimination for quantile forecasts. Considering $n_E$ events $E_i : \Omega > \omega_i$, 
$i = 1, \dots, n_E$, with increasing base rate, AUC is estimated by a trapezoidal approximation as 

$$\mathrm{AUC} = \sum_{i=0}^{n_E} \frac{1}{2}\left(H_{\alpha,i+1} + H_{\alpha,i}\right)\left(F_{\alpha,i+1} - F_{\alpha,i}\right) \tag{26}$$

where $(F_{\alpha,i}, H_{\alpha,i})$ denotes the RUC point of event $E_i$, with the trivial points $F_{\alpha,0} = H_{\alpha,0} = 0$ 
(for an event of base rate 0) and $F_{\alpha,n_E+1} = H_{\alpha,n_E+1} = 1$ 
(for an event of base rate 1). In order to reduce the biases introduced by the limited number of 
RUC points, the RUC curve can be fitted under a bi-normal assumption. The procedure involves 
considering $F_\alpha$ and $H_\alpha$ as both expressed as integrations of the standard normal distribution 
(Mason, 1982). The bi-normal model has been shown to be valid in most cases when applied in 
the ROC framework (Mason and Graham, 2002; Atger, 2004). 
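
A RUC curve and its trapezoidal area can be sketched as follows for the user with $\alpha = 0.5$. The sketch takes the quantile forecast at face value, i.e. it uses $q^* = \omega$ as critical decision criterion for every event, which is only justified for a reliable forecast; for an unreliable forecast the criteria would first have to be estimated as described above. The simulated sample, the grid of thresholds and the function names are assumptions, and no bi-normal fit is applied.

```python
import random

def ruc_points(quantile_forecasts, observations, thresholds):
    """(F, H) pairs of one user for a range of events Omega > omega."""
    points = [(0.0, 0.0), (1.0, 1.0)]               # trivial end points
    for omega in thresholds:
        events = [o > omega for o in observations]
        actions = [q > omega for q in quantile_forecasts]   # face value: q* = omega
        n_event = sum(events)
        n_nonevent = len(events) - n_event
        if n_event == 0 or n_nonevent == 0:
            continue                                # rates undefined for this event
        hits = sum(1 for a, e in zip(actions, events) if a and e)
        false_alarms = sum(1 for a, e in zip(actions, events) if a and not e)
        points.append((false_alarms / n_nonevent, hits / n_event))
    return sorted(points)

def trapezoidal_auc(points):
    """Trapezoidal approximation of the area under the sorted (F, H) pairs."""
    return sum(0.5 * (h0 + h1) * (f1 - f0)
               for (f0, h0), (f1, h1) in zip(points, points[1:]))

rng = random.Random(1)
signal = [rng.gauss(0.0, 1.0) for _ in range(2000)]
observations = [rng.gauss(s, 1.0) for s in signal]
q50 = signal                         # median of the predictive distribution N(s, 1)
thresholds = [-3.0 + 0.25 * k for k in range(25)]

print(trapezoidal_auc(ruc_points(q50, observations, thresholds)))
```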

The properties of the RUC curve and AUC are discussed with the help of illustrative examples 
based on four simple simulation test cases (see Section 2.1). In Figure 4, the forecast attributes 
reliability, resolution and discrimination are shown as a function of the probability level $\tau$ of the 
$\tau$-quantile forecast under assessment. RUC curves for the 50%-quantile forecasts are also shown. 
Quantile forecast reliability and resolution are estimated using the decomposition of the quantile 
score (Bentzien and Friederichs, 2014) while discrimination curves and summary measures are 
estimated based on the bi-normal assumption. 

Figure 4 (a) shows the lack of reliability, which occurs by construction in the simulations $A_1$, $A_2$ 
and $B$. In Figures 4 (b) and 4 (c), resolution and discrimination measures deliver a similar message 
when comparing the different simulations, which illustrates the idea that “resolution and discrimination 
are the two faces of the same coin” (Bröcker, 2014). Resolution and discrimination, however, exhibit 
different behaviours as a function of the probability level, reflecting the fact that the first takes the 
forecaster’s perspective and the second the user’s perspective. Moreover, discrimination ability is 
identical for the simulations $A_0$, $A_1$ and $A_2$: it is unaffected by biases and dispersion errors. 

Indeed, AUC is by construction insensitive to conditional and unconditional biases. In contrast, 
the forecast derived from simulation $B$ with a perturbed signal presents less discrimination ability 
than forecasts from the other simulations, in particular for the 50%-quantile forecast. Focusing 
on users with cost-loss ratio $\alpha = 0.5$ ($\tau = 0.5$), RUC curves for the 50%-quantile forecasts of 
simulations $A_0$, $A_1$, $A_2$ and $B$ are shown in Figure 4 (d). The largest discrepancies between 
simulations $A$ and $B$ are visible at the centre of the RUC curves, that is for events with intermediate 
base rates, while for events with small or large base rates the RUC curves tend to overlap. 



Figure 4: (a) Reliability, (b) resolution and (c) discrimination as a function of the probability
level $\tau$ of the $\tau$-quantile forecasts and (d) RUC curves for the 50%-quantile forecasts ($\tau = 0.5$).
The results are shown for the simulation test cases $A_0$ (full lines), $A_1$ (dashed lines), $A_2$ (dotted
lines) and B (full grey line).


5 Value of quantile forecasts 

5.1 Economic value 


The cost-loss model described in Section 3.2 has been used to develop the concept of economic
value of a probabilistic forecast. The forecast value is assessed by considering the decisions made
by an $\alpha$-user about the occurrence of an event. The value of a forecast (also called value score or
relative value) is defined as


$$V = \frac{E_{\mathrm{climate}} - E_{\mathrm{forecast}}}{E_{\mathrm{climate}} - E_{\mathrm{perfect}}}, \tag{27}$$

where the mean expense $E$ of an $\alpha$-user is estimated when decisions are based on a forecast
($E_{\mathrm{forecast}}$), on a perfect deterministic forecast ($E_{\mathrm{perfect}}$), or on climatological information
($E_{\mathrm{climate}}$) (Richardson, 2000; Wilks, 2001; Zhu et al., 2002). $V$ is a measure of the economic gain
(or reduction of mean expense) when using a forecast relative to the gain when using a perfect
deterministic forecast.

Following e.g. Richardson (2011), the mean expense of a forecast user can be written as 

$$E_{\mathrm{forecast}} = F (1 - \pi) C - H \pi (L - C) + \pi L, \tag{28}$$

where $H$ and $F$ are the hit rate and false alarm rate as defined in Eqs (20) and (21), respectively,
and $\pi$ the base rate of the event of interest. A user with a perfect deterministic forecast at hand
has to face costs only. The user mean expense corresponds in this case to:

$$E_{\mathrm{perfect}} = \pi C. \tag{29}$$


For a user who bases his (her) decision on climatological information, the optimal mean expense
is expressed as

$$E_{\mathrm{climate}} = \begin{cases} C & \text{if } \alpha < \pi, \\ \pi L & \text{if } \alpha \geq \pi, \end{cases} \tag{30}$$

depending on the relationship between cost-loss ratio and base rate. Combining Eqs (28)-(30), the
value of a forecast can finally be written as:


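$$V = \begin{cases}
(1 - F) - \dfrac{\pi}{1 - \pi}\, \dfrac{1 - \alpha}{\alpha}\, (1 - H) & \text{if } \alpha < \pi, \\[1ex]
H - \dfrac{1 - \pi}{\pi}\, \dfrac{\alpha}{1 - \alpha}\, F & \text{if } \alpha \geq \pi.
\end{cases} \tag{31}$$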


So, the economic value $V$ is defined for an event with base rate $\pi$ and a user with cost-loss ratio
$\alpha$. $V$ depends on the forecast performance in terms of $H$ and $F$.
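
The equivalence between the expense ratio of Eq. (27) and the two-branch expression of Eq. (31)
can be checked numerically. The sketch below is a minimal illustration: the function names and the
sample numbers are hypothetical, and the climatological expense is written as min(C, pi L), which is
equivalent to the two cases of Eq. (30).

```python
def mean_expenses(hit_rate, false_alarm_rate, base_rate, cost, loss):
    """Return (E_forecast, E_perfect, E_climate) as in Eqs (28)-(30)."""
    e_forecast = (false_alarm_rate * (1.0 - base_rate) * cost
                  - hit_rate * base_rate * (loss - cost)
                  + base_rate * loss)
    e_perfect = base_rate * cost
    # Climatology: always protect (expense C) or never protect (expense pi * L).
    e_climate = min(cost, base_rate * loss)
    return e_forecast, e_perfect, e_climate


def value_from_expenses(hit_rate, false_alarm_rate, base_rate, cost, loss):
    """Relative value V as the expense ratio of Eq. (27).

    Undefined (division by zero) when E_climate equals E_perfect.
    """
    e_forecast, e_perfect, e_climate = mean_expenses(
        hit_rate, false_alarm_rate, base_rate, cost, loss)
    return (e_climate - e_forecast) / (e_climate - e_perfect)


def value_closed_form(hit_rate, false_alarm_rate, base_rate, alpha):
    """Relative value V as the two-branch expression of Eq. (31)."""
    odds = base_rate / (1.0 - base_rate)   # pi / (1 - pi)
    user = alpha / (1.0 - alpha)           # alpha / (1 - alpha)
    if alpha < base_rate:
        return (1.0 - false_alarm_rate) - odds / user * (1.0 - hit_rate)
    return hit_rate - user / odds * false_alarm_rate


# Hypothetical user: C = 2, L = 10 (alpha = 0.2), event with base rate 0.3.
v_ratio = value_from_expenses(0.8, 0.2, 0.3, cost=2.0, loss=10.0)
v_closed = value_closed_form(0.8, 0.2, 0.3, alpha=2.0 / 10.0)
```

Both routes should agree up to rounding, since only the ratio C/L enters the relative value.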

Applied to a probability forecast, the event's base rate is fixed and the value of a probability
forecast is generally represented in the form of a probability value plot showing $V$ as a function of
$\alpha$. An example is provided in Figure 5 (a), applied to simulation $A_0$ considering the event $E: \omega > 0$.
The forecast value curves are plotted for a range of probabilities used as decision criterion; the
optimal value for each $\alpha$-user (the upper envelope of the relative value curves) is then selected to
represent the value of the probabilistic forecast system (e.g. Richardson, 2000; Wilks, 2001). The
probability value plot is related to the ROC framework since the pairs $(F, H)$ of Eq. (31) are the
ones used to draw the ROC curve. It has also been shown that the overall value of a probability
forecast, considering all potential users, corresponds to the Brier skill score of the forecast if the
distribution of cost-loss ratios is uniform over all users (Murphy, 1969; Richardson, 2011).
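
The envelope construction can be sketched as follows, assuming that one (F, H) pair is available per
probability level used as decision criterion. The pairs below are hypothetical and the function names
are illustrative only; the value expression is the two-branch form of Eq. (31).

```python
def relative_value(hit_rate, false_alarm_rate, base_rate, alpha):
    """Relative value V of Eq. (31) for a single (F, H) pair."""
    ratio = (base_rate / (1.0 - base_rate)) * ((1.0 - alpha) / alpha)
    if alpha < base_rate:
        return (1.0 - false_alarm_rate) - ratio * (1.0 - hit_rate)
    return hit_rate - false_alarm_rate / ratio


def value_envelope(roc_points, base_rate, alphas):
    """Upper envelope over decision criteria: best V for each alpha-user."""
    return [max(relative_value(h, f, base_rate, alpha) for f, h in roc_points)
            for alpha in alphas]


# Hypothetical (F, H) pairs, one per probability level used as criterion.
roc_points = [(0.60, 0.95), (0.30, 0.80), (0.10, 0.50)]
alphas = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
envelope = value_envelope(roc_points, base_rate=0.5, alphas=alphas)
```

The envelope may be negative for some users when no available decision criterion outperforms
climatology for them.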


5.2 Quantile value plot 

Applied to a quantile forecast, so focusing on an $\alpha$-user, the value score is evaluated for a range
of events of interest defined for example by their base rate $\pi$. A new tool is therefore proposed
for the assessment of quantile forecast performance: the quantile value plot, which represents how
$V$ varies as a function of $\pi$. This is illustrated in Figure 5 (b). The value of the 30%-quantile
forecasts is plotted when the quantile forecasts derived from simulations $A_0$, $A_1$, $A_2$, and B are
taken at face value. Taking a quantile at face value means using it as it is, so for each event it
implies considering the event threshold as decision criterion (see Section 3.2). An alternative is to
apply the critical decision criteria, i.e. to use the $(F, H)$ pairs from the RUC curve to estimate the
value in Eq. (31). We then talk about potential value since it corresponds to the maximum value
of the forecast, i.e. the maximum that could potentially be reached if an adequate calibration is
applied to the forecast. Indeed, value and potential value are by definition identical if the forecast
is reliable.

Figure 5: (a) Value $V$ of the probability forecast from simulation $A_0$ for the event defined
as $E: \omega > 0$ with base rate $\pi = 0.5$. The dashed lines represent the forecast value when
the probability levels 0.1, 0.2, ..., 0.9 are chosen as decision criterion. The full line represents the
envelope of the dashed lines. (b) Value $V$ for users with cost-loss ratio $\alpha = 0.7$ of the 30%-quantile
forecasts taken at face value from the 4 synthetic datasets: $A_0$ (full black line, square), $A_1$ (dashed
line, triangle), $A_2$ (dotted line, circle) and B (full grey line, cross). The black point is the common
point of the two plots: value of the simulation $A_0$ for the event with base rate $\pi = 0.5$ and a user
with cost-loss ratio $\alpha = 0.7$.
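
The computation behind a quantile value plot with forecasts taken at face value can be sketched as
follows. This is a minimal illustration under stated assumptions: the synthetic data are hypothetical
(a standard normal signal plus standard normal noise, not the simulations of Section 2.1), each event
is of the form "observation exceeds a threshold", the user acts when the quantile forecast exceeds
that threshold, and the user's cost-loss ratio is taken as one minus the probability level.

```python
import random
from statistics import NormalDist


def quantile_value_curve(forecasts, observations, tau, thresholds):
    """Value of tau-quantile forecasts taken at face value.

    Each threshold t defines an event "observation > t". Returns a list of
    (base_rate, value) pairs for a user with cost-loss ratio alpha = 1 - tau.
    """
    alpha = 1.0 - tau
    n = len(observations)
    curve = []
    for t in thresholds:
        occurred = [obs > t for obs in observations]
        acted = [q > t for q in forecasts]  # event threshold as decision criterion
        n_event = sum(occurred)
        if n_event == 0 or n_event == n:
            continue  # base rate 0 or 1: the value is undefined
        hit_rate = sum(a and o for a, o in zip(acted, occurred)) / n_event
        false_alarm_rate = (sum(a and not o for a, o in zip(acted, occurred))
                            / (n - n_event))
        pi = n_event / n
        ratio = (pi / (1.0 - pi)) * ((1.0 - alpha) / alpha)
        if alpha < pi:
            value = (1.0 - false_alarm_rate) - ratio * (1.0 - hit_rate)
        else:
            value = hit_rate - false_alarm_rate / ratio
        curve.append((pi, value))
    return curve


# Hypothetical synthetic data: N(0, 1) signal plus N(0, 1) noise, with a
# reliable quantile forecast given by signal + Phi^{-1}(tau).
rng = random.Random(0)
tau = 0.3
shift = NormalDist().inv_cdf(tau)
signal = [rng.gauss(0.0, 1.0) for _ in range(20000)]
observations = [s + rng.gauss(0.0, 1.0) for s in signal]
forecasts = [s + shift for s in signal]

# Thresholds chosen so that the events have base rates close to 0.1, ..., 0.9.
climatology = NormalDist(0.0, 2.0 ** 0.5)
thresholds = [climatology.inv_cdf(1.0 - p / 10.0) for p in range(1, 10)]
curve = quantile_value_curve(forecasts, observations, tau, thresholds)
best_base_rate, best_value = max(curve, key=lambda point: point[1])
```

Plotting the second element of each pair against the first would give one line of a quantile value
plot; for this reliable synthetic forecast the largest value is expected for the event whose base
rate is closest to the user's cost-loss ratio.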

A parallel between the probability value plot and the quantile value plot can be drawn. In a probability
value plot, the decision variable is a probability forecast, the base rate $\pi$ of the event under focus
is fixed and the forecast value $V$ is then plotted for a range of cost-loss ratios. The roles of $\alpha$
and $\pi$ are inverted in order to produce a quantile value plot rather than a probability value plot.
The cost-loss ratio is defined by the quantile probability level and a range of events of interest is
scanned. As a result, the cost-loss ratio of the end-user does not appear explicitly in a quantile
value plot, unlike in the value plot for probability forecasts.

The fundamental properties of $V$ are however the same when focusing on one event or on one user.
These properties (demonstrations can be found e.g. in Richardson, 2011) are recalled here. First,
the forecast value reaches its maximum when $\pi = \alpha$ (or, noted differently, when $\pi = 1 - \tau$). For
instance, a forecast user with a cost-loss ratio of $\alpha = 0.1$ draws a maximum benefit from a forecast
if his (her) event of interest has a climatological probability of occurrence of 10%. Secondly, the
value of a reliable forecast (full line in Figure 5 (b)) is always greater than the value of the same
forecast with biases (dashed and dotted lines in Figure 5 (b)). The value of the reliable forecast
corresponds to the potential value of the two other datasets. Finally, the potential value is by
definition always positive.
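
The first property can be made concrete with Eq. (31). When the base rate $\pi$ equals the cost-loss
ratio $\alpha$, the weighting factor in front of the error term equals one, so both branches reduce to
the same expression. A short worked step, using only the symbols of Eq. (31):

```latex
\pi = \alpha
\;\Rightarrow\;
\frac{\pi}{1-\pi}\,\frac{1-\alpha}{\alpha} = 1
\;\Rightarrow\;
V = (1 - F) - (1 - H) = H - F .
```

For the user with a cost-loss ratio of 0.1 mentioned above, the value for the event with a base rate
of 10% is therefore simply the difference between the hit rate and the false alarm rate.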

5.3 A real example 

The tools introduced for the assessment of quantile forecast discrimination and value are here
applied to a real dataset. Quantile forecasts of global radiation are derived from COSMO-DE-EPS
and assessed for two periods of the year 2013. Results for the winter period are shown in Figure
6 and results for the summer period in Figure 7. Quantile discrimination is estimated with the
area under the RUC curve (AUC) for probability levels $\tau = 0.1, 0.2, \ldots, 0.9$. A deeper analysis is
performed for the 10%-, 50%- and 90%-quantile forecasts with the help of quantile value plots.

The discrimination ability of the EPS quantile forecasts varies as a function of the probability level
but remains greater than 0.80, which can be interpreted as good performance. For the winter season,
discrimination is higher for high and low probability levels than for intermediate ones, whereas for
the summer season, discrimination is approximately constant over the probability levels with a
tendency to decrease for high levels. Inspection of the quantile value plot allows a deeper insight
into the forecast potential performance. This could be relevant for quantile users with a specific
interest in only one part of the event spectrum. The potential value of the quantile forecasts is
plotted as a function of the event, expressed in terms of the clearness index in %, to simplify the
reading of the plots. However, an event has a different base rate for each season, which complicates
a direct comparison of the quantile value plots in Figures 6 and 7.

5.4 Overall value and Quantile Skill Score 

As a final step in drawing a parallel between probability forecast verification and quantile forecast
verification, the relationship between value and skill score with climatology as a reference is
explored. It has been shown that the overall value of a probability forecast is equivalent to its Brier
Skill Score (BSS) when the users have a uniform distribution of cost-loss ratio (Murphy, 1969;
Richardson, 2011). Similarly, we now investigate the relationship between the overall value of a
quantile forecast and its QSS.

For this purpose, we extend the cost-loss model to more than two observation categories, assuming
that the cost $C$ and the loss $L$ of the cost-loss model are the unitary increments of cost and loss per
unit of variable, respectively, as discussed in Section 3.3. Following Richardson (2011), the overall
value is defined as the ratio

$$V_{\mathrm{all}} = \frac{T_C - T_F}{T_C - T_P}, \tag{32}$$

where the total mean expense $T$ of a user is estimated when decisions are based on a climatological
forecast ($T_C$), on a perfect deterministic forecast ($T_P$) or on a given forecast ($T_F$), so that Eq. (32)
is the extension of Eq. (27) to all possible events.

The total expense for a perfect deterministic forecast corresponds to the sum of the costs $C$
associated with each observation $\Omega_i$. The total mean expense $T_P$ can then be expressed as

$$T_P = \frac{1}{N} \sum_{i=1}^{N} C\, \Omega_i. \tag{33}$$

For a climatological quantile forecast $\Omega_\tau$, the total expense corresponds to the sum of the costs
associated with $\Omega_\tau$ and the losses encountered when the observations are greater than the
climatological forecast ($\Omega_i > \Omega_\tau$). The total mean expense for a climatological forecast $T_C$ is written as

$$T_C = \frac{1}{N} \sum_{i=1}^{N} C\, \Omega_\tau + \frac{1}{N} \sum_{i:\, \Omega_i > \Omega_\tau} L\, (\Omega_i - \Omega_\tau). \tag{34}$$

Considering now a sample of quantile forecasts $q_{\tau,i}$ and the corresponding observations $\Omega_i$, the
total expense of a forecast user corresponds in that case to the sum of the costs associated with
each forecast $q_{\tau,i}$ and the losses encountered when $\Omega_i > q_{\tau,i}$, given by

$$T_F = \frac{1}{N} \sum_{i=1}^{N} C\, q_{\tau,i} + \frac{1}{N} \sum_{i:\, \Omega_i > q_{\tau,i}} L\, (\Omega_i - q_{\tau,i}). \tag{35}$$

Combining Eqs (33)-(35), it is shown in the Appendix that the overall value $V_{\mathrm{all}}$ corresponds to
QSS (Eq. (8)) with the climatology as a reference, based on the assumption of a constant cost-loss
ratio for all outcomes. In other words, extending the dichotomous event-action framework to a
continuous framework allows one to turn back to the 'classical' or 'natural' measure of performance
for quantile forecasts. Conversely, using the dichotomous framework provides the keys to making a
deeper analysis of the quantile performance at the event level.
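
The correspondence between overall value and quantile skill score can be checked numerically. The
sketch below is a minimal illustration under stated assumptions: the data are hypothetical synthetic
values, the loss is set to L = 1 with cost C = 1 - tau, the quantile score is taken to be the mean
asymmetric (check) loss, the skill score is taken in the usual form one minus the ratio of scores,
and the climatological quantile is a simple order statistic of the sample. All function names are
illustrative.

```python
import random
from statistics import NormalDist


def quantile_score(forecasts, observations, tau):
    """Mean asymmetric (check) loss of tau-quantile forecasts."""
    total = 0.0
    for q, obs in zip(forecasts, observations):
        total += (1.0 - tau) * (q - obs) if obs < q else tau * (obs - q)
    return total / len(observations)


def total_mean_expense(forecasts, observations, cost, loss):
    """Cost per unit of forecast plus loss per unit of observed exceedance."""
    costs = sum(cost * q for q in forecasts)
    losses = sum(loss * (obs - q)
                 for q, obs in zip(forecasts, observations) if obs > q)
    return (costs + losses) / len(observations)


def overall_value(forecasts, climatological_quantile, observations, tau):
    """Overall value as an expense ratio, with L = 1 and C = 1 - tau."""
    cost, loss = 1.0 - tau, 1.0
    reference = [climatological_quantile] * len(observations)
    t_f = total_mean_expense(forecasts, observations, cost, loss)
    t_c = total_mean_expense(reference, observations, cost, loss)
    # Perfect deterministic forecast: the forecast equals the observation.
    t_p = total_mean_expense(observations, observations, cost, loss)
    return (t_c - t_f) / (t_c - t_p)


# Hypothetical synthetic data: N(0, 1) signal plus N(0, 1) noise.
rng = random.Random(1)
tau = 0.9
signal = [rng.gauss(0.0, 1.0) for _ in range(5000)]
observations = [s + rng.gauss(0.0, 1.0) for s in signal]
forecasts = [s + NormalDist().inv_cdf(tau) for s in signal]
clim_q = sorted(observations)[int(tau * len(observations))]

v_all = overall_value(forecasts, clim_q, observations, tau)
reference = [clim_q] * len(observations)
qss = 1.0 - (quantile_score(forecasts, observations, tau)
             / quantile_score(reference, observations, tau))
```

The two quantities should agree up to rounding whatever constant is used as the climatological
reference, since the identity is algebraic rather than statistical.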



Figure 6: Verification results for COSMO-DE-EPS global radiation forecasts during winter
2012/2013: quantile discrimination ability (AUC) as a function of the probability level (a),
potential value of the 10%-quantile forecast (b), 50%-quantile forecast (c) and 90%-quantile forecast
(d) as a function of the event of interest defined by thresholds of the clearness index in %.



Figure 7: Same as Figure 6 but for summer 2013. 


6 Conclusion 

Verification measures and tools related to users' decision-making are provided here for quantile
forecasts as decision variables. Drawing a parallel with the verification of probability forecasts, the
new verification tools allow the suite of verification methods for quantile forecasts to be completed.
In particular, the concepts of forecast discrimination and forecast value are discussed based on a
simple cost-loss model.

First, the RUC curve is shown to be the counterpart of the ROC curve when the focus is on a given
user rather than on a given event. The areas under the RUC and ROC curves are summary measures
of discrimination adapted to quantile and probability forecasts, respectively. Both measures
share the same properties, such as non-sensitivity to calibration.

Second, the translation of discrimination ability into value is explored with the help of the value
score. The definition of the forecast value is directly adopted from the probability forecast
verification framework. Forecast value and forecast potential value are estimated when the decision
variable is a quantile forecast, so focusing on a user with a specific cost-loss ratio. The first is
obtained when the forecast is taken at face value and the second when critical decision criteria are
applied. The value of a quantile forecast can then be plotted as a function of a range of events
of interest, defined for example in terms of base rates. The derived plot is called a quantile value
plot and provides a valuable insight into the performance of a quantile forecast. As a real example,
the discrimination ability and value of global radiation forecasts from COSMO-DE-EPS are
demonstrated over a summer and a winter period.

Finally, it is shown that the overall value of a quantile forecast corresponds to the quantile skill
score with climatology as reference when a constant cost-loss ratio for all outcomes is assumed.
In the same spirit as the weighted version of the continuous ranked probability score proposed by
Gneiting and Ranjan (2011), a weighted version of the quantile skill score could be envisaged in
order to take into account specific use of quantile forecasts.

Appendix

Overall value and Quantile Skill Score 

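From Eqs (33) and (34), the difference in expense between climatological and perfect deterministic
forecasts can be written as

$$T_C - T_P = \frac{1}{N} \sum_{i=1}^{N} C\, (\Omega_\tau - \Omega_i) + \frac{1}{N} \sum_{i:\, \Omega_i > \Omega_\tau} L\, (\Omega_i - \Omega_\tau). \tag{36}$$

Considering the relationship $\tau = 1 - C/L$ and setting $L$ equal to 1 in the following demonstration
without loss of generality, we obtain

$$T_C - T_P = \frac{1 - \tau}{N} \sum_{i=1}^{N} (\Omega_\tau - \Omega_i) + \frac{1}{N} \sum_{i:\, \Omega_i > \Omega_\tau} (\Omega_i - \Omega_\tau), \tag{37}$$

and with some algebra

$$T_C - T_P = \frac{1 - \tau}{N} \sum_{i:\, \Omega_i < \Omega_\tau} (\Omega_\tau - \Omega_i) + \frac{\tau}{N} \sum_{i:\, \Omega_i \geq \Omega_\tau} (\Omega_i - \Omega_\tau). \tag{38}$$

This mean expense difference, $T_C - T_P$, corresponds to the definition of the quantile score for a
climatological forecast ($QS_{\mathrm{climate}}$).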
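
The algebra between Eqs (37) and (38) is not detailed in the text. One way to fill it in, writing
$\Omega_i$ for the observations and $\Omega_\tau$ for the climatological $\tau$-quantile (notation assumed here), is to
split the first sum of Eq. (37) according to the sign of $\Omega_i - \Omega_\tau$ and to merge the two contributions
of the exceedance cases:

```latex
\begin{aligned}
T_C - T_P
&= \frac{1-\tau}{N} \sum_{i:\,\Omega_i < \Omega_\tau} (\Omega_\tau - \Omega_i)
 + \frac{1}{N} \sum_{i:\,\Omega_i \geq \Omega_\tau}
   \left[ (1-\tau)(\Omega_\tau - \Omega_i) + (\Omega_i - \Omega_\tau) \right] \\
&= \frac{1-\tau}{N} \sum_{i:\,\Omega_i < \Omega_\tau} (\Omega_\tau - \Omega_i)
 + \frac{\tau}{N} \sum_{i:\,\Omega_i \geq \Omega_\tau} (\Omega_i - \Omega_\tau).
\end{aligned}
```

Terms with $\Omega_i = \Omega_\tau$ vanish, so the choice between a strict and a non-strict inequality in the
index sets makes no difference.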

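In the same manner, from Eqs (35) and (34), the difference between the climatological forecast expense
and the quantile forecast expense is written as

$$T_C - T_F = \frac{1}{N} \sum_{i=1}^{N} C\, \Omega_\tau + \frac{1}{N} \sum_{i:\, \Omega_i > \Omega_\tau} L\, (\Omega_i - \Omega_\tau) - \frac{1}{N} \sum_{i=1}^{N} C\, q_{\tau,i} - \frac{1}{N} \sum_{i:\, \Omega_i > q_{\tau,i}} L\, (\Omega_i - q_{\tau,i}), \tag{39}$$

which becomes, with $L = 1$ and $C = 1 - \tau$ as above and after some algebra,

$$T_C - T_F = \left[ \frac{1 - \tau}{N} \sum_{i:\, \Omega_i < \Omega_\tau} (\Omega_\tau - \Omega_i) + \frac{\tau}{N} \sum_{i:\, \Omega_i \geq \Omega_\tau} (\Omega_i - \Omega_\tau) \right] - \left[ \frac{1 - \tau}{N} \sum_{i:\, \Omega_i < q_{\tau,i}} (q_{\tau,i} - \Omega_i) + \frac{\tau}{N} \sum_{i:\, \Omega_i \geq q_{\tau,i}} (\Omega_i - q_{\tau,i}) \right], \tag{40}$$

where the first term corresponds to the definition of the quantile score for a climatological forecast
($QS_{\mathrm{climate}}$, Eq. (38)), and the second term to the quantile score ($QS_{\mathrm{forecast}}$, Eq. (7)). With regard
to the definition of the quantile skill score and of the overall value (Eqs (8) and (32), respectively),
we end up with:

$$V_{\mathrm{all}} = QSS. \tag{41}$$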